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Abstract 



Gaussian random fields (GRFs) constitute an important part of spatial modelling, but 
can be computationally infeasible for general covariance structures. An efficient approach is 
to specify GRFs via stochastic partial differential equations (SPDEs) and derive Gaussian 
Markov random field (GMRF) approximations of the solutions. We consider the construction 
of a class of non-stationary GRFs with varying local anisotropy, where the local anisotropy is 
introduced by allowing the coefficients in the SPDE to vary with position. Specifically, using 
a form of diffusion equation driven by Gaussian white noise with a diffusion matrix that 

<^ varies with position. This allows for the introduction of parameters that control the GRF 

by parametrizing the diffusion matrix. These parameters and the GRF may be considered 
to be part of a hierarchical model and the parameters estimated in a Bayesian framework. 
The results show that the use of an SPDE with non-constant coefficients is a useful way 
of creating non-stationary spatial GMRFs that allows for physical interpretability of the 
I parameters. 

^ Keywords: Non-stationary, Spatial, Gaussian random fields, Gaussian Markov random 

ON fields, Anisotropy, Bayesian 
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^ 1 Introduction 

Many spatial models for continuously indexed phenomena, such as temperature, precipitation 
and air pollution, are based on Gaussian random fields (GRFs). This is mainly due to the fact 
that their theoretical properties are well understood and that their distributions can be fully 
^ described by mean and covariance functions. In principle, it is enough to specify the mean at 

each location and the covariance between any two locations. However, specifying covariance 
functions is hard and specifying covariance functions that can be controlled by parameters in 
useful ways is even harder. This is the reason why the covariance function usually is selected from 
a class of known covariance functions such as the exponential covariance function, the Gaussian 
covariance function or the Matcrn covariance function. 

But even when the covariance function is selected from one of these classes, the feasible 
problem sizes are severely limited by a cubic increase in computation time as a function of the 
number of observations and prediction locations. This computational challenge is usually tackled 
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cither by reducing the dimensionality of the problem (Cressie and Johannesson, 2008 Banerjee et 



al., 2008 1, by introducing sparsity in the precision matrix (Rue and Held, 2005 ) or the covariance 



matrix (Furrer et al., 2006), or by using an approximate likelihood (Stein et al., 2004 Fuentes, 



2007). Sun et al. (2012) offers comparisons of the advantages and challenges associated with the 



usual approaches to large spatial datasets. 

The main goal of this paper is to explore a new class of non-stationary GRFs that provide both 
an easy way to specify the parameters and allows for fast computations. The main computational 



tool used is Gaussian Markov random fields (GMRFs) (Rue and Held, 2005) with a spatial 
Markovian structure where each position is conditionally dependent only on positions close to 
itself. The strong connection between the Markovian structure and the precision matrix results 
in sparse precision matrices that can be exploited in computations. The main problem associated 
with such an approach is that GMRFs must be constructed through conditional distributions, 
which presents a challenge as it is generally not easy to determine whether a set of conditional 
distributions gives a valid joint distribution. Additionally, the conditional distributions have to 
be controlled by useful parameters in such a way that not only the joint distribution is valid, 
but also such that the effect of the parameters is understood. Lastly, it is desirable that the 
GMRF is a consistent approximation of a GRF in the sense that when the distances between 
the positions decrease, the GMRF "approaches" a continuous GRF. These issues are even more 
challenging for non-stationary GMRFs. It is is extremely hard to specify the non-stationarity 
directly through conditional distributions. 

There is no generally accepted way to handle non-stationary GRFs, but many approaches 
have been suggested. There is a large literature on methods based on the deformation method 



of Sampson and Guttorp (1992), where a stationary process is made non-stationary by deform- 
ing the space on which it is defined. Several Bayesian extensions of the method have been 



proposed (Damian et al., 2001 2003 Schmidt and O'Hagan, 2003 Schmidt et al., 2011), but 



all these methods require replicated realizations which might not be available. There has been 
some development towards an approach for a single realization, but with a "densely" observed 
realization (Anderes and Stein, 2008). Other approaches use kernels which are convolved with 



Gaussian white noise (Higdon, 1998 Paciorek and Schervish, 2006), weighted sums of station- 
ary processes (Fuentes, 20011 and expansions into a basis such as a wavelet basis (|Nychka~et 



al., 2002). Conceptually simpler methods have been made with "stationary windows" (Haas, 



1990b 1990a) and with piecewise stationary Gaussian processes (Kim et al., 20051. There has 



also been some progress with methods based on the spectrum of the processes (Fuentes, 2001 



2002a| 2002b). Recently, a new type of method based on a connection between stochastic 



partial differential equations (SPDEs) and some classes of GRFs was proposed by Lindgren et 



al. (2011 ). They use an SPDE to model the GRF and construct a GMRF approximation to the 
GRF for computations. An application of a non-stationary model of this type to ozone data can 



be found in Bolin and Lindgren (2011) and an application to precipitation data can be found 
in Ingebrigtsen et al. (20131. 

This paper extends on the work of Lindgren et al. (201 1[ ) and explores the possibility of 
constructing a non-stationary GRF by varying the local anisotropy. The interest lies both in 
considering the different types of structures that can be achieved, and how to parametrize the 
GRF and estimate the parameters in a Bayesian setting. The construction of the GRF is based 
on an SPDE which describes the GRF as the result of a linear filter applied to Gaussian white 
noise. Basically, the SPDE expresses how the smoothing of the Gaussian white noise varies 
at different locations. This construction bears some resemblance to the deformation method 



of Sampson and Guttorp (1992 ) in the sense that parts of the spatial variation of the linear filter 
can be understood as a local deformation of the space, only with an associated spatially varying 
variance for the Gaussian white noise. The main idea for computations is that since this filter 



works locally, it implies a Markovian structure on the GRF. This Markovian structure can be 
transferred to a GMRF which approximates the GRF, and in turn fast computations can be 
done with sparse matrices. 

This paper presents a first look into a new type of models and the main goal is to explore 
what can be achieved in terms of models and inference with the models. Section [2] contains 
the motivation and introduction to the class of non-stationary GRFs that is studied in the 
other sections. The form of the SPDE that generates the class is given and it is related to 
more standard constructions of GMRFs. In Section [3] illustrative examples are given on both 
stationary and non-stationary constructions. This includes some discussion on how to control 
the non-stationarity of the GRF. Then Section [4] explores parameter estimation for these types 
of models through different examples with simulated data. The paper ends with discussion and 
concluding remarks in Section [5] 

2 New class of non- stationary GRFs 

A GMRF u is usually parametrized through a mean /j, and a precision matrix Q such that 
u ~ Af(fi, Q _1 ). The main advantage of this formulation compared to the usual parametrization 
of multivariate Gaussian distributions through the covariance matrix is that the Markovian 



structure is represented in the non-zero structure of the precision matrix Q (Rue and Held, 



2005). Off-diagonal entries are non-zero if and only if the corresponding elements of u arc 



conditionally independent. This can be seen from the conditional properties of a GMRF, 

E(ui\u-i) = fj,i- — — ^ Qi,o( u i ~ Mi) 

and 

Var(uj|u_j) = — — , 

where U-i denotes the vector u with clement i deleted. From these conditional properties one 
can see that a Markovian spatial structure requires Qij to be zero for locations i and j that are 
not close to each other, but it is not clear what values should be given to the non-zero elements of 
the precision matrix. This is the framework of the conditionally auto-regressive (CAR) models, 



whose conception predates the advances in modern computational statistics (Whittle, 1954 
|Besag, 1974] ). In the multivariate Gaussian case it is clear that the requirement for a valid joint 
distribution is that Q is positive definite, which is not an easy condition to check. 

Specification of a GMRF through the conditional properties given above is usually done in a 
somewhat ad-hoc manner. For regular grids, a process such as random walk can be constructed 
and the only major issue is to get the conditional variance correct as a function of step-length. For 
irregular grids the situation is not as clear because each of the conditional means and variances 



must depend on the varying step-lengths. In Lindgren and Rue (2008 ) it is demonstrated that 



some such constructions for second-order random walk can lead to inconsistencies as new grid 

points are added, and they offer a surprisingly simple construction for second-order random walk 

based on the SPDE 

d 2 

where a > and W is standard Gaussian white noise. If the precision matrix is chosen according 
to their scheme one does not have to worry about scaling as the grid is refined, as it automatically 
approaches the continuous second-order random walk. There is an automatic procedure to select 
the form of the conditional means and variances. 



One-dimensional second-order random walk is a relatively simple example of a process with 
the same behaviour everywhere. To approximate a two-dimensional, non-stationary GRF, a 
scheme would require (possibly) different anisotropy and correct conditional variance at each 
location. To select the precision matrix in this situation poses a large problem and there is 
abundant use of simple models such as a spatial moving average 

®( u i,j\ U -{(i,j)}) = j( u i-l,j + "i+lj + u i,j-l + UiJ+l) 

with a constant conditional variance 1/a. There are ad-hoc ways to extend such a scheme to a 
situation with varying step-lengths in each direction, but little theory for more irregular choices 
of locations. 

This is why the choice was made to start with the close connection between SPDEs and some 



classes of GRFs that was presented in Lindgren et al. (2011 ), which is not plagued by the issues 



above. From |Whittle (1954[ ) it is known that the SPDE 

(k 2 -A)u{s) = W(s), sel 2 , (1) 

where n > and A = -^ + #^ is the Laplacian, gives rise to a GRF u with the Matern 
covariance function 

r(«) = -^(K|MI)tfi(«NI), 

where K\ is the modified Bessel function of the second kind of order 1. The intriguing part, 
that Lindgren et al. (2011) expanded upon, is that (k 2 — A) can be interpreted as a linear filter 



acting locally. This means that if the continuously indexed process u were instead represented 
by a GMRF u on a grid or a triangulation, one could replace this operator with a matrix, say 
B(k 2 ), only involving neighbours of each location such that Equation ([I]) becomes approximately 

B(K 2 )u~Af(0,I). (2) 

The matrix B(re 2 ) depends on the chosen grid, but after the relationship is derived, the calculation 
of B(k 2 ) is straightforward for any k 2 . Since B(k 2 ) is sparse, the resulting precision matrix 
Q(k 2 ) = B(k 2 ) t B(k 2 ) for u is also sparse. This means that by correctly discretizing the operator 
(or linear filter), it is possible to devise a GMRF with approximately the same distribution as 
the continuously indexed GRF. And because it comes from a continuous equation one does not 
have to worry about changing behaviour as the grid is refined. 

The class of models that are studied in this paper is the one that can be constructed from 
Equation ([I]), but with anisotropy added to the A operator. A function H, that gives 2x2 positive 
definite matrices at each position, is introduced and the operator is changed to V • H(s)V = 

Si 7 w^ I hi,j{ s )-j&: ) ■ This basically gives different degrees of smoothing in different directions, 
which results in a range that varies with direction at all locations. The SPDE becomes 

(k 2 -V-H(s)V)u(s)=W(s), seV= [Ai,Bi] x [A 2 ,B 2 ] CE 2 . (3) 

Both for interpretation and for practical use of the equation above it is useful to decompose 
H into scalar functions. The anisotropy due to H is decomposed as 

H(s) = 7 I 2 + v(s)v{s) T , 

where 7 specifies the isotropic, baseline effect and the vector field v(s) = [v x (s), Vy(s)} 1 - specifies 
the direction and magnitude of the local, extra anisotropic effect at each location. In this way, one 



can, loosely speaking, think of different Matern like fields locally each with its own anisotropy 
that are combined into a full process. An example of an extreme case of a process with a 
strong local anisotropic effect is shown in Example |3.2| The example shows that there is a close 
connection between the vector field and the resulting covariance structure of the GRF. 

The main computational challenge is to determine the appropriate discretization of the SPDE 
in Equation pi), that is how to derive a matrix B such as in Equation (pi). The idea is to look to 
the field of numerics for discretization methods for differential equations. Then combine these 
with properties of Gaussian white noise. Namely, that for a Lebesgue measurable subset A of 
M. n , for some n > 0, 



/. 



W{s)ds~Af(0,\A\), 



where \A\ is the Lebesgue measure of A, and that for two disjoint Lebesgue measurable subsets 
A and B of R™ the integral over A and the integral over B are independent ( Adler and Taylor, 
|2007[ pp. 24-25). A matrix equation such as Equation pi) was derived for the SPDE in Equa- 
tion pi) with a finite volume method. The derivations are quite involved and technical and are in 
AppendixlX] However, when the form of the discretized SPDE has been derived as an expression 
of the coefficients in the SPDE and the grid, the conversion from SPDE to GMRF is automatic 
for any choice of coefficients and rectangular domain. 



3 Examples of models 

The simplest case of Equation pi is with constant coefficients. In this case one has an isotropic 
model (up to boundary effects) if H is a constant times the identity matrix or a stationary 
anisotropic model (up to boundary effects) if this is not the case. In both cases it is possible 
to calculate an exact expression for the covariance function and the marginal variance for the 
corresponding SPDE solved over K 2 . 
For this purpose write 

H\ H 2 

H 2 H 3 



H 



where Hi, H 2 and H3 are constants. This gives the SPDE 



#1 



dx 2 



2H-, 



O 2 



dxdy 



H, 



dy 2 



u{s)=W{s) 1 



s e 



(4) 



But if Ai and \ 2 are the eigenvalues of H, then the solution of the SPDE is actually only a 
rotated version of the solutions of 



dx 2 



2 dy 2 



u(a) = W(s), 



s e 



(5) 



Here the new x-axis is parallel to the eigenvector of H corresponding to Ai in the old coordinate 
system and the new y-axis is parallel to the eigenvector of H corresponding to A2 in the old 
coordinate system. 

From Proposition |B.1| one can see that the marginal variance of u is 



1 



1 



m 47TKV<fct(H) 47TK 2 VA7A^ 

One can think of the eigenvectors of H as the two principal directions and Ai and A2 as a measure 
of the "strength" of the diffusion in these principal directions. Additionally, if Ai = A2, which 



is equivalent to H being equal to a constant times the identity matrix, the SPDE is rotation 
and translation invariant and the solution is isotropic. If Ai 7^ A2, the SPDE is still translation 
invariant, but not rotation invariant, and the solutions are stationary, but not isotropic. 

In our case the domain is not R 2 , but [0, A] x [0,2?] with periodic boundary conditions. This 
means that the results are not exactly true, but as long as the correlation range is small compared 
to the size of the domain the results are approximately true in the centre. 

3.1 Stationary models 

For a constant H the SPDE in Equation ffib becomes 

[k 2 - V ■ HV]u(s) =W(s), se [0,A] x [0,2?]. 

This SPDE can be rewritten as 

[1 - V • HV]u(s) = ctWO), se[0,l] x [0,2?], (6) 

where H = H/k 2 and a = 1/k 2 . From this form it is clear that a is only a scale parameter and 
that it is enough to solve for u = 1 and then multiply the solution with the desired value of a. 
Therefore, it is the effect of H that is most interesting to study. 
It is useful to parametrize H as 

H = 7 I 2 + f3v(8)v(8f , 

where v(8) = [cos(6), sin(6*)] T . In this parametrization one can think of 7 as the coefficient of 
the second order derivative in the direction orthogonal to v(9) and 7 + (3 as the coefficient of 
the second order derivative in the direction v(ff). Ignoring boundary effects, 7 and 7 + /? are the 
coefficients of the second order derivatives in Equation ([5]) and is how much the coordinate 
system has been rotated in positive direction. 

Example 3.1 (Stationary GMRF). The purpose of this example is to consider the effects of 
using a constant H. Use the SPDE in Equation (pi) with domain [0, 20] x [0, 20] and periodic 
boundary conditions, and discretize with a regular 200 x 200 grid. Two different values of 
H are used, an isotropic case with H = I2 and an anisotropic case with 7 = 1, (3 = 8 and 

= 7r/4. The anisotropic case corresponds to a coefficient 9 in the cc-direction and a coefficient 

1 in the y-direction, and then a rotation of 7r/4 in the positive direction. The isotropic GMRF 
has marginal variances 0.0802 and the anisotropic GMRF has marginal variances 0.0263. For 
comparison Proposition |B . 1 1 gives 0.0796 and 0.0263. 

Figure [I] shows one realization for each of the cases. Comparing Figure 1(a) and Figure 1(b) | 



it seems that the direction with the higher coefficient for the second-order derivative has longer 
range and more regular behaviour. Compared to the corresponding partial differential equation 
(PDE) without the white noise, this is what one would expect since large values of the coefficient 
penalize large values of the second order derivatives. One should expect that the correlation 
range increases when the coefficient is increased. 

This is in fact what happens. Figure [2] shows the correlation of the variable at (9.95,9.95) 
with every other point in the grid for the isotropic and the anisotropic case. This is sufficient 
to describe all the correlations since the solutions are stationary. One can immediately note 
that the iso-correlation curves are close to ellipses with semi-axes along v(8) and the direction 
orthogonal to v(6). One can see that the correlation decreases most slowly and most quickly 
in the directions used to specify H, with slowest decrease along v(9). It is interesting to see 
that both the isotropic case and the non-isotropic case has approximately the same length for 
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Figure 1: (a) Realization from the SPDE in Example 3.1 on [0, 20] 2 with a 200 x 200 grid and 

0. [(b)] Realization from the SPDE 



periodic boundary conditions with 7 = 1, ft = and 9 

in Example 3.1 on [0, 20] 2 with a 200 x 200 grid and periodic boundary conditions with 7 = 1, 



(3 = 8 and 9 = tt/4. 



the minor semi-axis of the iso-correlation curves, and that the major semi-axis is longer for the 
anisotropic case. This is due to the fact that the lengths of the semi-axes are connected with y/j 
and V7 + /?■ 

From the example above one can see that the use of 3 parameters allow for the creation of 
GMRFs which are more regular in one direction than the other. One can use the parameters 7, j3 
and 9 to control the form of the correlation function and a to get the desired marginal variance. 



3.2 Non-stationary models 

To make the solution of the SPDE in Equation fcty non-stationary, either n 2 or H has to be a 
non-constant function. One way to achieve non-stationarity is by choosing 

H(s)= 7 h + (3v(s)v(s) T , 

where v is a non-constant vector field on [0, A] x [0, B] which satisfy the periodic boundary 
conditions and 7 > and j3 > are constants. 

Example 3.2 (Non-stationary GMRF). Use the domain [0, 20] 2 with a 200 x 200 grid and 
periodic boundary conditions for the SPDE in Equation (pi). Let k 2 be equal to 1 and let H be 
given as 

H(s)= 7 I 2 + Pv(s)v(s) T , 

where v is a 2-dimcnsional vector held on [0, 20] 2 which satisfies the periodic boundary conditions 
and 7 > and /3 > are constants. 

To create an interesting vector held, start with the function / : [0, 20] 2 — > R dchned by 



f(x,y) 



sin(27rx/20) + - sin(27n//20) 



Then calculate the gradient V/ and let v : [0, 20] 2 — > M 2 be the gradient rotated 90° counter- 
clockwise at each point. Figure 3(a) shows the values of the function / and Figure [3(b) | shows 
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Figure 2: (a) Correlation of the centre with all other points for the solution of the SPDE in 
Example 3.1 on [0, 20] 2 with a 200 x 200 grid and periodic boundary conditions with 7=1, 
ft = and = 0. |(b)| Correlation of the centre with all other points for the SPDE in Example |3.1 



on [0, 20] with a 200 x 200 grid and periodic boundary conditions with 7 = 1, /? = 8, 6 — 7r/4. 



the resulting vector field v. The vector field is calculated on a 400 x 400 regular grid, because 
the values between neighbouring cells in the discretization is needed. 

Figure 4(a)| shows one realization from the resulting GMRF with 7 = 0.1 and /3 = 25. A 
much higher value for j3 than 7 is chosen to illustrate the connection between the vector field 
and the resulting covariance structure. From the realization it is clear that there is stronger 
dependence along the directions of the vector field shown in Figure |3(b)| at each point than in 
the other directions. In addition, from Figure |4(b)| it seems that positions with large values for 
the norm of the vector field has smaller marginal variance than positions with small values and 
vice versa. 

From Figure [5] and Figure [6] one can see that the correlations depend on the direction and 
norm of the vector field, and that there is clearly non-stationarity. Figure 6(a) and Figure 



) 



show that the correlations with the positions (4.95,1.95) (4.95,7.95) tend to follow the vector 
field around the point (5,5), whereas Figure [6(b)| and Figure [6(d)| show that the correlations 
with the positions (14.95,1.95) and (14.95,7.95) tend to follow the vector field away from the 
point (15,5). Figure 6(e) shows that the correlations with position (4.95,4.95) and every other 
point is not isotropic, but concentrated close to the point itself, and Figure [6(f)] shows that the 
correlations with position (14.95,4.95) have high correlation along four directions which extends 
out from the point. Figure [5] shows that the correlations with position (9.95,9.95) "follow" the 
vector field with high correlations in the vertical direction. 

From this example one can see that allowing H to be non-constant means that one can vary the 
dependence structure in more interesting ways than the stationary anisotropic fields. Secondly, 
using a vector field to control how H varies means that the resulting correlation structure can 
be partially visualized from the vector field. Thirdly, when 7 > this construction guarantees 
that H is everywhere positive definite. 
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(a) The function used to create the vector field. 
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(b) The resulting vector field. 



Figure 3: The gradient of the function illustrated in |(a)| is calculated and rotated 90° counter- 
clockwise at each point to give the vector field illustrated in |(b)| 
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Figure 4: One observation and the marginal variances of the solution of the SPDE in Equation pi) 
on a 200 x 200 regular grid of [0, 20] 2 with periodic boundar y con ditions, k 2 = 1 and H = 
O.II2 + 25vv T , where v is the vector field described in Example 3.2 
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Figure 5: Correlations with position (9.95, 9.95) and all other points for the solution of the SPDE 
in Example |3.2| 



4 Inference 

This section begins with a discussion of the parametrization of the model and the derivation 
of the posterior distribution. Then the properties of the inference scheme is discussed through 
some examples on simulated data. 



4.1 Posterior distribution and parametrization 

The first step for inference is to introduce parameters that control the behaviour of the coefficients 
in Equation (pi) and in turn the behaviour of the GMRF. The way this is done is by expanding 
each of the functions in a basis and use a linear combination of the basis functions weighted by 
parameters. For k 2 only one parameter, say di, is needed as it is assumed constant, but for the 
function H a vector of parameters 9 2 is needed. Set 9 = (9i,9 2 ) and give it a prior 9 ~ ?r(0). 
Then for each value of 0, the discretization in Appendix |A.3| is used to construct the GMRF 
u\9 ~ jV(0, Q(0) _1 ). Combine the prior of 9 with this conditional distribution to find the joint 
distribution of the parameters and u. Together with a model for how an observation y is made 
from the underlying GMRF this forms a hierarchical spatial model. The relationship between 
y and u is chosen to be particularly simple, namely that linear combinations of u are observed 
with Gaussian noise, 

y\u ~N (Au.Qh 1 ), 

where Qn is a known precision matrix. 

The purpose of the hierarchical model is to do inference on 9 based on an observation of 
y. This requires the posterior distribution of 9 based on an observation of y. The first step in 
calculating this distribution is to calculate the probability density of u\y, 9, 

n(u\y,9) oc n(u,y\9) 

= ir(u\9)Tr(y\u,9) 

oc exp ( -- [u T Q{9)u + {y - A«) T Q N (y - Ait)] 

oc exp (-- [m t (Q(6») + A t Q n A)m - 2M T A T Q N y] ) . 
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(b) Correlations with position (14.95,2.05). 
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(c) Correlations with position (4.95,7.95). 
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Figure 6: Correlations for different points with all other points for the solution of the SPDE in 
Example 3.2 



If 



Denote Qc(0) = Q(#) + A T Q N A and /x c (0) = Q c (6»)- 1 A T Q N y, then 

uly.e^JVC/icW.QcW 1 ). 

The next step is to integrate out u from the joint density of u, y and 6 using the Bayes' 
theorem, 

n{u\y,6) 

^ ir(9)Tr(u\9)7r(y\u,Q) 
n(u\y,0) 

Since the left-hand side of the equation does not depend on u, neither can the right-hand side 
of the equation. Evaluating the right-hand side at u = gives 



n(9\y) oc 



|Q(0)nQ N |V2 ^ i 



x ex P (A [(V - AO) T Q N (y - AO)] 



x exp (+i [(0 - Mc (0)) T Q c (0)(O - Mc (0))] 
Or in other words 

l0g(7T(%)) = 

Const + bg(7r(fl)) + ilog(|Q(0)|) 

- i log(lQcWI) + ^ c (0) T Qc(0)McW- (7) 

From the above expression one can see that the posterior distribution of 6 contains terms 
which arc hard to handle analytically. It is hard to say anything about both the determinants 
and the quadratic term as functions of 6. Therefore, the inference is done numerically. The 



model is on a form which could be handled by the INLA methodology (Rue et al., 2009), but 
at the time of writing the R-INLA softwarqj does not have the model implemented. Instead the 
parameters are estimated with maximum a posteriori estimates based on the posterior density 
given in Equation |7|. In addition, the standard deviations are estimated from the square roots 
of the diagonal elements of the observed information matrix. 
The decomposition of H introduced in Section [5J 

HO) = 7I2 + v(s)v{s) T , 

needs to be controlled by a finite number of parameters. The simple case of a constant matrix 
H requires 3 parameters. Use parameters 7, v± and V2 and write 



H(s) = 7 I 2 



Pi v 2 \ 



1 www . r- inla . org 
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If H is not constant, it is necessary to parametrize the vector field v in some manner. Any vector 

field is possible for v, so a basis which can generate any vector field is desirable. 

Let the domain be [0, A] x [0, B] and assume that v is a differentiable, periodic vector field 

on the domain. Then each component of the vector field can be written as a Fourier series of the 

form 

,'k I 

2m | —x + — y 



^ C k ,i exp 

(M)6Z 2 



A 



B' 



where i is the imaginary unit. But since the components are real-valued, each of them can also 
be written as a real 2-dimcnsional Fourier series of the form 



E 

(k,l)eE 



Aki cos 



2tt I —x- 
A 



I 
B ! 



Aom 



where the set E C Z 2 is given by 

£=(NxZ)U ({0} x 
Putting these Fourier series together gives 



Bui. sin 



2tt 



A' 



I 

— 'i 
B' 



v(s) 



E 

(k,i)eE 



E 

(k,l)£E 

(1)' 



2tt 



A' 



B- 



2tt 



A X 



B V 



(8) 



where A\ \ and B k j are the coefficients for the first component of v and A\ I and B k j are the 
coefficients of the second component. This gives 2 coefficients when only the zero- frequency is 
included, then 18 parameters when the (0, 1), (1, —1), (1,0) and (1, 1) frequencies are included. 
When the number of frequencies used in each direction doubles, the number of required param- 
eters quadruples. 

4.2 Inference on simulated data 

In this section we consider data generated from a known set of parameters. The prior used is an 
improper prior that disallows illegal parameter values. It is uniform on (0, oo) for 7 and uniform 
on M. for the rest of the parameters in H. The first example presents the simplest case of exactly 
observed data and constant coefficients. 



Example 4.1. Use the SPDE 

u(s) - V • HVu(s) 



W{s), s e [0, 20] x [0, 20] 



(9) 



where W is a standard Gaussian white noise process and H is a 2 x 2 matrix, with periodic 
boundary conditions. Let 

H = 3I 2 + 2vv T , 

with v = (1, a/3)/2. This means that H has eigenvector v with eigenvalue 5 and an eigenvector 
orthogonal to v with eigenvalue 3. Construct the GMRF on a 100 x 100 grid. 
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Realization 




Figure 7: One realization of the solution of the SPDE in Example |4.1 



Tabic 1: Parameter estimates for Example 4.1 



Parameter 


True value 


Estimate 


Std.dev 


7 


3 


2.965 


0.070 


V\ 


0.707 


0.726 


0.049 


V-2 


1.225 


1.231 


0.039 



One observation of the solution is shown in Figure [7] Assume that the fact that H is constant 
is known, but that its value is not. Then using the decomposition from the previous sections one 
can write 

Vl 

V2 



H = 7 I 2 



Vl v 2 



where 7, v± and v 2 are the parameters. Since the process is assumed to be exactly observed, we 
can use the distribution 9\u. This gives the posterior estimates shown in Tablefl] From the table 
one can see that all the estimates are accurate to one digit, and within one standard deviation 
of the true value. Actually, this decomposition of H is invariant to changing v with —v, so there 
are two choices of parameters that means the same. 

In the example above the estimation works. But this is under the assumption that it is known 
beforehand that the coefficients are constant. In general this seems unreasonable. Therefore, the 
same situation is considered again without this assumption. 



Example 4.2. Use the same SPDE and observation as in Example |4.1( but assume that it is 
not known that H is constant. Add the terms in the Fourier series corresponding to the next 
frequencies, (k, 1) — (0,1), (k, I) = (1,-1), (k, I) = (1,0) and (k,l) = (1,1). The observation is 
still assumed to be exact, but there are 16 additional parameters, 4 additional parameters for 
each frequency. 

First two arbitrary starting positions are chosen for the optimization. The first is 7 = 3.0 
and all other parameters at 0.1. And the second is 7 = 3.0, A ' = 0.1, A Q q = 0.1 and all 
other parameters equal to 0. For both of these starting points the optimization converges to 
non-global maximums. Parameter estimates and approximate standard deviations are not show, 
but Figure [8] shows the two different vector fields found. 

A third optimization is done with starting values close to the correct parameter values. This 
gives a vector field close to the actual one, with estimates for 7, A q and A q that agree with 
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Inferred vector field 



Inferred vector field 
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(a) Wrong maximum. 



(b) Wrong maximum. 



Figure 8: Two different local maxima found for the vector field. The vector field in 
lower value for the posterior distribution of the parameters than the vector field in (b) 
has lower value than the actual maximum. 



(a) has a 
and both 



the ones in Example 4.1 to two digits. The other frequencies all had coefficients close to zero, 
with the largest having an absolute value of 0.058. 

Comparing the two vector fields in Figure [5] with a constant vector field, it is clear that there 
are major differences. In this example the problem is that there are at least three maxima for the 
posterior (actually at least 6, due to symmetry). It is hard to avoid these alternative maxima, 
but part of the problem is likely to be that there are 19 parameters, but only one observation. 

The example shows that introducing many parameters makes the optimization much harder, 
but that when starting close to the actual parameter values one gets an estimate close to the 
actual vector field. In addition, there is a large increase in computation time when increasing 
the parameter space to be 19-dimensional, compared to a 3-dimensional parameter space. The 
computation time required is increased by a factor of 10. 

Example 4.3. Use a 100 x 100 grid of [0, 20] 2 and periodic boundary conditions for the SPDE 
in Equation (pi). Let n 2 be equal to 1 and let H be parametrized as 



where v is the vector field from Example |3.2| 

Figure [9] shows one observation of the solution with 7 = 0.5 and (3 — 5. In this case one 
expects that it is possible to make accurate estimates about 7 and /3 as the solution is observed at 
all grid points. The inference is done with the posterior distribution 0\y ,with Qn = Iioooo/V 2 j 
where a 2 — 10~ 6 , and A = iioooo- In comparison the smallest marginal variance of x\6 is 0.06. 
This means all values are assumed to be observed nearly exactly. 

The estimated parameters are shown in Table[5J From the table one can see that the estimates 
for both 7 and /3 are quite accurate, which is reflected both in the actual value of the estimates 
and the approximated standard deviations. The estimates for both 7 and /3 are accurate to 2 
digits. 

The example above shows that when using only the 7I2 term and one vector field for H, the 
estimates for the parameters are quite accurate. The accuracy of the estimates for f} and 7 will 
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Observation 




Figure 9: An observation of the SPDE in Equation pi) on a 100 x 100 regular grid of [0, 20] 2 
with periodic boundary conditions, k 2 = 1 and H(s) = O.5I2 + 5v(s)v(s) T , where v is the vector 
field in Example |3.2| 



Table 2: Posterior inference on parameters in Example |4.3| 
Parameter True value Estimate Std.dev. 

7 0.5 0.5012 0.0081 

8 5 5.014 0.084 



of course depend on the vector field used. In a more realistic situation the actual components 
of the vector field is not known and there is observation noise. In the following example only a 
subset of the required frequencies of the Fourier basis is included. 

Example 4.4. Use a 100 x 100 grid of [0, 20] 2 and periodic boundary conditions for the SPDE 
in Equation p|. Let n 2 be equal to 1 and let H be given as 



where v is the vector field 



v(x,y) = 



n(s)=I 2 + v(s)v{s)' ] 



\w x ) 



3 + 2am(^y)+sm(^(x + y)) 



One observation with i.i.d. Gaussian noise with precision 400 is shown in Figure |10| Based 
on this realization it is desired to estimate the correct value of 7 and the correct vector field v 
in the parametrization 

H(s) = 7I2 + v(s)v{s) T . 

First use only one extra frequency in each direction, that is only the frequencies (0,0), (0,1) 
and (1,0). This gives the estimated vector field shown in Figure 11(a) Then add the missing 
frequency and use the frequencies (0,0), (0, 1), (1,0) and (1, 1). This gives the estimated vector 



field shown in Figure 11(b) The true vector field is shown in Figure 11(c) 



Both estimated vector fields are quite similar to the true vector field, and the 7 parameter 
was estimated to 1.09 in the first case and 0.99 in the latter case. All parameter values were 
estimated, but are not shown. For the first case many parameters is more than two standard 



1G 



Observation 



>.10 




Figure 10: An observation of the SPDE in Example |4.4| with i.i.d. Gaussian white noise with 
precision 400. 



deviations from their correct values and in the second case this only happens for one parameter. 
For each case the difference between the true H and the estimated H is calculated through 



1 
100 



\ 



100 100 

EE|| H Ki)- ft Ki) 

»=1 J=l 



where s,j are the centres of the cells in the grid and || • \\2 denotes the 2-norm. The case with 
frequencies (0,0), (0, 1) and (1,0) give 7.8 and the case with frequencies (0,0), (0, 1), (1,0) and 
(1,1) give 2.8. 

5 Discussion 

The paper explores different aspects of a new class of non-stationary GRFs based on local 
anisotropy. The benefit of the formulation presented is that it allows for flexible models with few 
requirements on the parameters. Since the GRF is based on an SPDE, there is no need to worry 
about how to change the discretized model in a consistent manner when the grid is refined. In 
other words, one does not need to worry about how the precision matrix must be changed to 
still give a similar covariance structure when the number of grid points is increased. This is one 
of the more attractive features of the SPDE-based modelling. It is also possible to do this type 
of modelling on more general domains through triangulations. This can be useful if the domain 
of interest is far from rectangular. But this requires a different boundary condition than the 
periodic boundary condition. But it is possible to use a Neumann type of boundary condition 
adapted to the operator V • HV. One cannot do a simple zero-derivative boundary condition 
because it does not stop "things" from leaving or entering the domain. 

The focus of the examples has been the matrix H introduced in the Laplace operator. The 
examples show that a variety of different effects can be achieved by using different types of 
functions. Constant matrices of the form 7I2 give isotropic random fields and other constant 
matrices give anisotropic, stationary random fields. As shown in Section [3] the anisotropic fields 
have anisotropic Matern covariance functions, through stretching and rotating the domain, that 
can be controlled by four parameters. One can control the marginal variances, the principal di- 
rections and the range in each of the principal directions. A non-constant H gives non-stationary 
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(a) (0,0), (0,1) and (1,0) frequencies 



(b) (0,0), (0,1), (1,0) and (1,1) frequencies 
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Figure 11: True vector field and inferred vector fields in Example 4.4 Each of the vector fields 
is scaled with a factor 0.3. 
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random fields. And by using a vector field to specify the matrix-valued function there is a con- 
nection between the vector field and the resulting covariance structure. The covariance structure 
can be partially visualized from the vector field. 

From the examples in Section [4] one can see that sensible values for the parameters are 
estimated both with and without noise, except for problems with multimodality in Example |4.2| 
which uses 19 parameters. The last example presents the most challenging case and is perhaps 
closest to a real scenario. In the example good results are achieved when estimating the vector 
field with only a subset of the frequencies required to fully describe it. 

There are many avenues that are not explored in this paper due to the fact that it is a first 
look into a new type of models. The chief motivation is to explore the class of models both in 
the sense of what can be achieved and simple tests of inference. It is of great interest to include 
non-constant k 2 and 7 in the future. When 7 is allowed to vary with position it is possible to vary 
the baseline effect at each location and when used together with n 2 it is possible to have more 
control over the marginal variances. Another important continuation for real-world modelling 
is the best choice of parametrization of the model and assignment of priors. It still remains to 
investigate appropriate choices of priors for applications. 
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A Derivation of precision matrix 

A.l Formal equation 

The SPDE is 

(k 2 (s)-V-H(s))Vu(s) =W(s), s£[0,A]x[0,B], (A.l) 

where A and B are strictly positive constants, k 2 is a scalar function, H is a 2 x 2 matrix-valued 

function, V = (g^, gM and W is a standard Gaussian white noise process. In addition, k 2 is 

assumed to be a continuous, strictly positive function and H is assumed to be a continuously 
diffcrentiable function which gives a positive definite matrix H(s) for each s £ [0,A] x [0, B]. 

Further, periodic boundary conditions are used, which means that opposite sides of the 
rectangle [Q,A] x [0,B] are identified. This gives additional requirements for k 2 and H. The 
values of k 2 must agree on opposite edges and the values of H and its first order derivatives must 
agree on opposite edges. The periodic boundary conditions are not essential to the methodology 
presented in what follows, but were chosen to avoid the issue of appropriate boundary conditions. 

A. 2 Finite volume methods 



In the discretization of the SPDE in Equation (A.l) a finite volume method is employed. The 



finite volume methods are useful for creating discretizations of conservation laws of the form 

V ■ F(x,t) = f(x,t), 

where V- is the spatial divergence operator. This equation relates the spatial divergence of the 
flux F and the sink-/source-term /. The main tool in these methods is the use of the divergence 
theorem 

jV-FdV=(b F-nda, (A.2) 

JE JdE 

where n is the outer normal vector of the surface dE relative to E. 

The main idea is to divide the domain of the SPDE in Equation ( A.l ) into smaller parts and 
consider the resulting "flow" between the different parts. A lengthy treatment of finite volume 
methods is not given, but a comprehensive treatment of the method for deterministic differential 
equations can be found in Eymard et al. (2000[). 



A. 3 Derivation 

To keep the calculations simple the domain is divided into a regular grid of rectangular cells. 
Use M cells in the x-direction and N cells in the y-direction. Then for each cell the sides parallel 
to the cc-axis have length h x = A/M and the sides parallel to the y-axis have length h y = B/N. 
Number the cells by (i,j), where i is the column of the cell (along the x-axis) and j is the row 
of the cell (along the y-axis). Call the lowest row and the leftmost column 0, then cell (i,j) is 

Eij = [ih x , (i + l)h x ] x [jh y , (j + l)h y ]. 

Using this notation the set of cells, I, is given by 

I = {E lJ : i = 0, 1, . . . , M - 1, j = 0, 1, . . . , N - 1}. 



Figure A.l shows an illustration of the discretization of [0, A] x [0, B] into the cells 2. 

Each cell has four faces, two parallel to the x-axis (top and bottom) and two parallel to the 
y-axis (left and right). Let the right face, top face, left face and bottom face of cell Eij be 
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Figure A.l: Illustration of the division of [0, A] x [0, B] into a regular M x N grid of rectangular 
cells. 
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Figure A. 2: One cell, Etj, of the discretization with faces of 1 -, ajp a\ '■ and <rf -, centroid s,-^ 
and centres of the faces S£_x/2,j) s i,i-i/2> s i+i/2,j an d Sij+i/2- 



denoted of^, u?., a\i and ct?,-, respectively. Additionally, denote by a(E i: j) the set of faces of 
cell Eij. 

For each cell Eij, Sij gives the centroid of the cell, and s i+1 / 2j -, Sjj+i/2! ■ s i-i/2.j an d s i j_ 1 / 2 
give the centres of the faces of the cell. Due to the periodic boundary conditions, the i-index 



and j-index in Sij are modulo M and modulo N, respectively. Figure A. 2 shows one cell Ei 



with the centroid and the faces marked on the figure. Further, let Uij = u(sj,j) for each cell and 
denote the area of Eij by V%a- Since the grid is regular, all Vij are equal to V = h x h y . 

To derive the finite volume scheme, begin by integrating Equation (A.l) over a cell, Eij. 
This gives 

/ K 2 (s)u(s)ds- V-H(s)Vu(s)ds= / W(s)ds, (A.3) 

where ds is an area element. The integral on the right hand side is distributed as a Gaussian 



variable with mean and variance V for each (i,j) (Adler and Taylor, 2007 pp. 24-25). Further, 
the integral on the right hand side is independent for different cells, because two different cells 



can at most share a common face. Thus Equation (A.3 1 can be written as 



I K 2 (s)u(s)ds- V-H(s)Vu(s)ds 

JE, i JEi t 



where Zi j is a standard Gaussian variable for each (i, j) and the Gaussian variables are indepen- 
dent. 



By the divergence theorem in Equation (A.2), the second integral on the left hand side can 



be written as an integral over the boundary of the cell. This results in 

/ k 2 (s)u(s) ds - <h (H(s)Vu(s)) T n(s)d<7 = Wzi d , 

JE i}j JdEij 

where n is the exterior normal vector of dEij with respect to E^j and da is a line clement. It is 



(A.4) 
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useful to divide the integral over the boundary in Equation (A.4| into integrals over each face, 



n 2 (s)u(s) ds - (W% + Wl 3 + W& + W&) = VVzij , (A.5) 



where W?? = J adir (U(s)Vu(s)) T n(s)da. 



The first integral on the left hand side of Equation (A.5 1 is approximated by 



k (s)u(s)ds = VK t jU(si j) — VttijUij, (A.6) 

where k 2 • = -k J E K 2 (s)ds. The function k 2 is assumed to be continuous and k 2 ■ is approxi- 
mated by n 2 (sij). 



The second part of Equation (A.5) requires the approximation of the surface integral over 
each face of a given cell. The values of H are in general not diagonal, so it is necessary to estimate 
both components of the gradient on each face of the cell. For simplicity, it is assumed that the 
gradient is constant on each face and that it is identically equal to the value at the centre of the 
face. On a face parallel to the y-axis the estimate of the partial derivative with respect to x is 
simple since the centroid of each of the cells which share the face have the same y-coordinate. 
The problem is the estimate of the partial derivative with respect to y. The reverse is true for 
the top and bottom face of the cell. 

It is important to use a scheme which gives the same estimate of the gradient for a given face 
no matter which of the two neighbouring cells are chosen. For the right face of Eij, that is <rf'-, 
the approximation used is 

d 1 

— U{s i+ i/ 2 ,j) ~ ^-00i + l/2,j+l/ 2 ) - u(s i+1 / 2J _i /2 )), 

where the values of u at s i+ i/ 2 j+i/2 an d ^+1/2^-1/2 are linearly interpolated from the values 
at the four closest cells. More precisely, because of the regularity of the grid the mean of the 
four closest cells are used. This gives 

d 1 

—u(s i+ i/ 2 ,j) « — (u l+ i J+ i + Uij+i - Uij-i - Ui+i t j-i). (A. 7) 

Note that this formula can be used for the partial derivative with respect to y on any face parallel 
to the y-axis by suitably changing the i and j indices. The partial derivative with respect to x 
on a face parallel to the y-axis can be approximated directly by 

d 1 

g^u(s i+1/ 2,j) « j^(u i+ i,j - Ui,j). (A.8) 

In more or less exactly the same way the two components of the gradient on the top face of 
cell Eij can be approximated by 

9 1 

-^- u \ s i,j + l/2) ~ Jj—\ u i+l,j+l + 1H+1J - Ui-l,j - Ui-l,j+l) 

and 
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Table A.l: Finite difference schemes for the partial derivative with respect to x and y at the different 
faces of cell Eij. 



Face 



c).c 



u(s) 



,) a 



u(s) 



ct r 


Ui + lJ—UiJ 




U t 


,j + l+Ui-t-l,j + l-Ui,j-l-U i+li j- 1 

4hy 






-«i_l, 3 + l 




ky 


°h 






U-t 


4h y 


°f, 


4h w 


-1—Ui-l.j 




1ii,j — Uij — i 



These approximations can be used on any side parallel to the x-axis by changing the indices 
appropriately. 

The approximations for the partial derivatives on each face are collected in Table [AT Using 
this table one can find the approximations needed for the second part of Equation (A.5). It is 
helpful to write 

Wfy=j (H(s)Vu{s)) T n(s)d<j= f (Vu( S )) T (H(s)n(s))dc7, 

Ja d " Jv d " 

where the symmetry of H is used to avoid transposing the matrix. Assuming that the gradient 
is identically equal to the value at the centre of the face, one finds 



W$n(Vu(c%)) 



H(s)n(s) da, 



where cf"' is the centre of face erf" . 

Since the cells form a regular grid, n is constant on each face. Let H be approximated by its 
value at the centre of the face, then 



W] 



dir 



m(a#) (V«(c*)) (H(c$)n(c$)) , 



(A.9) 



where m(af 1 f) is the length of the face. Note that the length of the face is either h x or h y and 
that the normal vector is parallel to the x-axis or the y-axis. 
Let 

"# n (s) H 12 (s) 
H 21 (s) H 22 {s)\ ' 

then using Table |A.1| one finds the approximations 

rK _ 



H( a ) 



Wf 



h„ 



H ll (s i+1/ 



a j J 



«i+i, 



h x 



H ^(Si+l/2,. 



, u i,j+l + M i + lj + l ~ u i,j-l ~ u i+l,j-l 

4th, 



W 



h.i 
h. 



H 22 (s hJ+1 / 2 )- 



.1M+l,j+l + u i+l,j ~ u i-l,j + l — u i-l,. 



AK 



21 



wt 



# U (^-l/2,j) 



U t -lJ 



**!J 



/?, r 



# 21 (s,-i/ 



2j/ 



, u i,j-l + Ui-ij-l — Mf-lJ+1 — Mij' + l 

4^ 



and 



TV' 



'j 



/'., 
/',, 






, Mj- lJ + Ui-l,j-l - u l+lj - Uj+l,j-l 

4/u 



These approximations can be combined with the approximations in Equation (A.6) and inserted 



into Equation (A.5) to give 



VkIju^-Iw^ + wZ + w^ 



W£-)=VV*ij. 



Stacking the variables Ui : j row-wise in a vector u, that is first row 0, then row 1 and so on, gives 
the linear system of equations, 



DyD K 2it — Ah« = D 



-nVa, 



(A.10) 



where T) v = V1 M N, D k 2 = diag(«g i0 , . ••,K 2 V f_i )0) «o,i!---' K M-i,iV-i)) z ~ ^Mn(0,Imn) and 
Ah is considered more closely in what follows. 

The construction of the matrix Ah, which depends on the function H, requires only that one 
writes out the sum 

w* + wl 3 + wl 3 + W?j 

and collects the coefficients of the different u 0) & terms. This is not difficult, but requires many 
lines of equations. Therefore, only the resulting coefficients are given. Fix (i,j) and consider the 
equation for cell Eij. For convenience, let i p and i n be the column left and right of the current 
column respectively and let j n and j p be the row above and below the current row respectively. 
These rows and columns are 0-indexed and due to the periodic boundary conditions one has, for 
example, that column is to the right of column M — 1. Further, number the rows and columns 
of the matrix Ah from to MN — 1. 

For row jM + i the coefficient of Uij itself is given by 

(A-ii)jM+i,jM+i = 

_^[tfi> i+1/2) .) + F"( Si _ 1/2iJ .)] 



-[^K^l+^f^-l 



/2j 
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The four closest neighbours have coefficients 

(A-n)jM+i,jM+i p = j^H (s l -i/2,j) - ^ [H (sij+1/2) - H (sij_i/ 2 )] 

(A H )jM+j,iM+j n = T^ff (*i+l/2,j) + I [# (Sij + l/2) - H (Si,j_l/ 2 )] 
(A H )jJW+i,i„JW+J = J^H 22 (s iJ + 1 /2) + - [H (S i+ i/2,j) - H («i_l/2,j)J 
(A H )jAf-H,j p M+t = T^-ff (Sij-1/2) - T [-ff (s»+l/2,j) - ^ 21 (Sj-l/2,j)] 

Lastly, the four diagonally closest neighbours have coefficients 

(A H )jM+i,j p M+i„ = +^ [ff u («y-i/s) + ^(^i-i/aj)] » 

(A H )iM+ij p M+t„ = -7 [# («i,j-l/2) + H («t+l/2,j)] > 
(A H )iM+i,j„M+i p = -7 [# (s. 1J + i/ 2 ) + -ff (Si-l/2,j)] , 
(A H )jM+«j n M+t„ = +^ [H (Sij + 1/2) +-ff (Si+l/2,3)] • 

The rest of the elements of row jM + i are 0. 



Based on Equation (A. 10 1 one can write 

z = D~ 1/2 A«, 
where A = DyD K 2 — Ah- This gives the joint distribution of u, 

tt(u) oc n(z) oc exp I — z T z 
n(u) ex exp I — u T A T D^ 1 Au 
n(u) ex exp I ——it Q« I , 

where Q = A T Dy 1 A. This is a sparse matrix with a maximum of 25 non-zero elements on each 
row, corresponding to the point itself, its 8 closest neighbours and the 8 closest neighbours of 
each of the 8 closest neighbours. 

B Marginal variances with constant coefficients 

Proposition B.l. Let u be a stationary solution of the SPDE 

K 2 u(x,y) - y-HVu(x,y) = W(x,y), (i,i/)el 2 , (B.l) 

where W is a standard Gaussian white noise process, K > is a constant, H is a positive definite 
2x2 matrix and V = ( ^ , -g- ) . 



Then u has marginal variance 



1 
o„ - 



47rKV det ( H )' 
2G 



Proof. Since the solution is stationary, Gaussian white noise is stationary and the SPDE has 
constant coefficients, the SPDE is acting as a linear filter. Thus one can use spectral theory to 
find the marginal variance. The transfer function of the SPDE is 

g(w) = - 



K 2 + w T Hw 

Further, the spectral density of a standard Gaussian white noise process on M 2 is identically 
equal to l/(2ir) 2 . It follows that the spectral density of the solution is 



fs(w) 



1\ 2 1 



2irJ (k 2 + w T Hw) 2 ' 
From the spectral density it is only a matter of integrating the density over R 2 , 

° m = / fs{w)dw. 

JR 2 

The matrix H is (symmetric) positive definite and, therefore, has a (symmetric) positive definite 
square root, say H 1 / 2 . Use the change of variables w = kH~ 1 ' 2 z to find 



Lwrhw dctiK ^ 1/2)dz 



a 2 = J- f ' 

m 4tt 2 



47r 2 K 2 ^det(H) Jr2 (1 + z T z) 2 
1 



4 7 TK 2 v / det(H) 



n 
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